ADBDEV-6599: Make gprestore --resize-cluster use --jobs for parallel restoration #110

Merged
RekGRpth merged 78 commits into master from ADBDEV-6599
Dec 6, 2024


@RekGRpth RekGRpth commented Oct 29, 2024

Make gprestore --resize-cluster use --jobs for parallel restoration

Users consider the current behavior incorrect: the --jobs parameter is honored
when creating a backup but is not actually used during restoration.
Restoration runs in a single instance of the gpbackup_helper agent (in
--restore-agent mode) per segment.

In short, when gprestore is run with the --resize-cluster option on a backup
that was created in parallel mode (--jobs), it does not restore in parallel.
Restoration goes through gpbackup_helper, which is launched as a single
process on each segment. As a result, one table is processed at a time,
following the list of table OIDs passed to gpbackup_helper at launch.

To solve this, the patch launches several gpbackup_helper instances, with
their number taken from the value of the --jobs argument.

To do this:

  1. The general list of table OIDs for restoration is distributed among
    several gpbackup_helper process instances according to the value of the
    --jobs parameter. The list is also distributed among goroutines in the
    main process in the same way.

  2. Since the general list of table OIDs for restoration is sorted by size, the
    distribution among gpbackup_helper instances and among goroutines is done
    round-robin: one table OID is handed to each instance and goroutine in
    turn. The load is therefore expected to be more uniform. All batches
    corresponding to the same table OID go to the same gpbackup_helper
    instance and the same goroutine.

  3. An integer "instance number" is added to the name of the file that holds
    the list of table OIDs for each gpbackup_helper instance. The number and
    the list contents are not logically related; what matters is that each
    instance is passed its own list as an argument when gpbackup_helper starts.

  4. The instance number is also added to the name of the script file
    (scriptFile).

  5. The instance number is also added to the names of the pipe file
    (pipeFile), the error file and the skip file.

  6. To create the initial list of pipes, one pipe is created per
    gpbackup_helper instance, for the first table OID that the instance will
    process (table OIDs are sorted by decreasing table size). Thus, by the time
    each gpbackup_helper instance starts, the file system already contains the
    pipe corresponding to the first table OID for that instance.

  7. When starting gpbackup_helper, the value of the --jobs parameter is taken
    into account (in fact, the number of connections in the pool). The number of
    launched instances equals the value of the parameter. Each instance receives
    as arguments the path of its first pipe (pipeFile), the path to the agent
    startup script (scriptFile), and the path to the list of table OIDs for
    that instance.

  8. When deleting auxiliary files in DoCleanup, the files with the list of table
    OIDs and the startup script files (scriptFile) are deleted.

  9. For --single-data-file and --copy-queue-size modes, the current behavior
    remains unchanged.

To make them easier to manage, all auxiliary files have been renamed: the
process PID is now placed before the suffix, and the word "pipe" has been
removed from the names of the error and skip files.

New tests have been added and old ones have been adapted.


It is easier to view the changes with the "Hide whitespace" option enabled.

@RekGRpth RekGRpth changed the title ADBDEV-6599: Parallelize of resize restore ADBDEV-6599: Make gprestore --resize-cluster use --jobs for parallel restoration Oct 29, 2024
@RekGRpth RekGRpth marked this pull request as ready for review October 29, 2024 10:13
whitehawk previously approved these changes Oct 30, 2024
@silent-observer

The new test has failed, please look into this

@RekGRpth RekGRpth marked this pull request as ready for review December 5, 2024 05:30
@RekGRpth

RekGRpth commented Dec 6, 2024

> Shouldn't this error file name be updated as well?

Yes, added.

@RekGRpth RekGRpth merged commit 43624bc into master Dec 6, 2024
2 checks passed
@RekGRpth RekGRpth deleted the ADBDEV-6599 branch December 6, 2024 14:25